Sound resynthesis from Auditory Mellin Image using STRAIGHT
Abstract
We propose an Auditory VOCODER that resynthesizes sound from the Auditory Mellin Image, an auditory representation that segregates the size and shape information of incoming sound. The resynthesis stage combines three techniques: the STRAIGHT VOCODER [2], frequency-warping cepstral analysis [4,12], and nonlinear multivariate regression analysis (MRA). We describe these methods and the evaluation of the system. Initial listening tests indicate that the sound quality is reasonable. The auditory components enhance noise suppression and stream segregation performance during speech processing.
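The size/shape segregation mentioned in the abstract rests on a general property of the Mellin transform: compressing or expanding a signal in time changes only the phase of the transform along the imaginary axis, not its magnitude. The following is a minimal sketch of that property alone, not the Auditory Mellin Image or any part of the proposed system. The toy signal `pulse`, the helper names, the dilation factor 1.5, and the integration grid are all assumptions chosen for illustration.

```python
import cmath
import math


def mellin_magnitude(f, c, u_min=-30.0, u_max=6.0, du=0.005):
    """Return |M(jc)|, where M(s) is the integral of f(t) * t**(s-1) dt.

    Substituting t = exp(u) turns the Mellin integral on s = jc into a
    Fourier integral over log-time: the integral of f(exp(u)) * exp(j*c*u) du.
    The grid limits are assumptions that suit the toy signal below.
    """
    n = int(round((u_max - u_min) / du))
    total = 0j
    for k in range(n + 1):
        u = u_min + k * du
        total += f(math.exp(u)) * cmath.exp(1j * c * u)
    return abs(total * du)


def pulse(t):
    # Hypothetical toy "impulse response" that decays at both ends in log-time.
    return t * math.exp(-t)


def dilated(f, a):
    # Same shape as f, with the time axis compressed by the factor a.
    return lambda t: f(a * t)


c_values = [0.0, 0.5, 1.0, 2.0, 4.0]
original = [mellin_magnitude(pulse, c) for c in c_values]
scaled = [mellin_magnitude(dilated(pulse, 1.5), c) for c in c_values]

# Largest difference between the two magnitude profiles.
max_diff = max(abs(x - y) for x, y in zip(original, scaled))
print(max_diff)
```

In this sketch the two magnitude profiles agree to within numerical error, so the "shape" of the toy signal is read from the magnitude while the dilation ("size") is carried by the phase.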
Similar resources
An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation
An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...
Stabilised wavelet Mellin transform: an auditory strategy for normalising sound-source size
We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...
Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model
We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...
Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform
We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal t...
Multi-frame Super Resolution for Improving Vehicle Licence Plate Recognition
License plate recognition (LPR) by digital image processing, which is widely used in traffic monitoring and control, is one of the most important goals in Intelligent Transportation Systems (ITS). In real ITS, the resolution of input images is not very high because of the technological challenges and cost of high-resolution cameras. However, when the license plate image is taken at low resolution, the license...